The Impact of Noise in Web Genre Identification

Authors

  • Dimitrios A. Pritsos
  • Efstathios Stamatatos
Abstract

Genre detection of web documents fits an open-set classification task. Web documents that do not belong to any predefined genre, or in which multiple genres co-exist, are considered noise. In this work we study the impact of noise on automated genre identification within an open-set classification framework. We examine alternative classification models and document representation schemes on two corpora, one without noise and one with noise, and show that the recently proposed RFSE model can remain robust in the presence of noise. Moreover, we show that the identification of certain genres is practically unaffected by the presence of noise.
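
The open-set setting described in the abstract can be illustrated with a minimal sketch: a classifier that assigns a document to a known genre only when it is similar enough to that genre, and otherwise labels it as unknown (that is, noise). This is not the RFSE model studied in the paper; it is a plain nearest-centroid classifier with a rejection threshold, and the function names, the bag-of-words representation, and the threshold value of 0.3 are all assumptions made for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector (illustrative representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_centroids(labelled_docs):
    """Sum the term counts of all training documents of each genre."""
    centroids = {}
    for genre, text in labelled_docs:
        centroids.setdefault(genre, Counter()).update(vectorize(text))
    return centroids


def classify_open_set(text, centroids, threshold=0.3):
    """Return the most similar genre, or "unknown" if no genre is close enough.

    The threshold is a hypothetical value; in practice it would be tuned
    on held-out data.
    """
    vec = vectorize(text)
    best_genre, best_sim = None, 0.0
    for genre, centroid in centroids.items():
        sim = cosine(vec, centroid)
        if sim > best_sim:
            best_genre, best_sim = genre, sim
    if best_sim < threshold:
        return "unknown"  # rejected: treated as noise
    return best_genre
```

A closed-set classifier would always return one of the known genres; the rejection step is what lets noisy pages be left unassigned instead of being forced into a predefined genre.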

Similar resources

Effects of Exposure to Impact Noise on the Hearing of Armed Forces and Evaluation of the Methods to Control and Decrease its Consequences: A Review Study

Background and Aim: Exposure to impact noise (short-term, high intensity) higher than permitted levels results in injury to the auditory system. Armed forces are one of the occupational groups exposed to these types of noises resulting from gunshots. In this study, relevant articles and research on the adverse effects of impact noise, hearing loss, and tinnitus in armed forces and effective con...

The Impact of Linear Process versus Genre-Based Approach on Intermediate EFL Learners’ Accuracy in Written Task Performance

The main purpose of the present quasi-experimental study was to investigate the effects of a linear process approach versus a genre-based approach on EFL learners’ written production. To this end, 40 intermediate-level learners of English were randomly selected as participants and assigned to two experimental groups (process and genre), which received different types of instruction f...

Implementing a Characterization of Genre for Automatic Genre Identification of Web Pages

In this paper, we propose an implementable characterization of genre suitable for automatic genre identification of web pages. This characterization is implemented as an inferential model based on a modified version of Bayes’ theorem. Such a model can deal with genre hybridism and individualization, two important forces behind genre evolution. Results show that this approach is effective and is...

Improving Data-based Wind Turbine Using Measured Data Foggy Method

The purpose of this paper is to improve the modeling of a data-driven wind turbine system that receives data from noisy signals. Most data from industrial systems is noisy, and data noise is inevitable and natural. The method and idea proposed in this paper, Data Fogging, significantly reduces the impact of noise on data-driven wind turbine system modeling, which is the basis of this met...

Impact of Genre-Based Instruction on Development of Students’ Letter Writing Skills: The Case of Students of Textile Engineering

The current study investigated the effectiveness of genre-based instruction on the development of EFL learners’ writing skills. Participants were 34 undergraduate students majoring in textile engineering at an Iranian state university, and they had enrolled in the English for specific academic purposes course. Participants were taught how to write 4 types of business letters, highlighting the p...

Publication date: 2015